January 03, 2021
An outlier is an object that deviates significantly from the rest of the objects and can cause major changes in imputations or other calculations. The analysis of outlier data is known as outlier analysis, and it is also part of data preprocessing.
Common Causes of Outliers
For example, the dataset (1, 3, 5, 7, 9) has a mean of 5, whereas the dataset (1, 3, 5, 7, 14), which contains the outlier 14, has a mean of 6. A single extreme value shifts the mean, and every calculation based on the mean shifts with it. This is the main reason for detecting and removing outliers.
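The arithmetic above can be checked with a few lines of Python. This is only an illustration of the example (the post itself works in R); `arithmetic_mean` is a hypothetical helper name chosen for this sketch.

```python
# The two example datasets from the text
clean = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # same data with 9 replaced by the outlier 14


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


print(arithmetic_mean(clean))         # 5.0
print(arithmetic_mean(with_outlier))  # 6.0
```

Changing one value out of five moves the mean from 5.0 to 6.0, which is exactly the shift described above.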
R Package Outlier
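A boxplot has an inner fence and an outer fence (check out the images below). With Q1 and Q3 as the first and third quartiles and IQR = Q3 - Q1, the inner fences lie at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and the outer fences lie at Q1 - 3 × IQR and Q3 + 3 × IQR.
Data points that lie beyond the inner fences are called outliers; points that also lie beyond the outer fences are extreme outliers.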
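The fence rule can be sketched in Python as follows. This is a minimal sketch, assuming quartiles computed with `statistics.quantiles(..., method="inclusive")`, which uses the same interpolation as R's default `quantile()` (type 7). R's `boxplot()` computes its box from Tukey's hinges (`fivenum`), which can differ slightly from these quartiles for some sample sizes. The function names `tukey_fences` and `classify_outliers` are hypothetical.

```python
from statistics import quantiles


def tukey_fences(values, k_inner=1.5, k_outer=3.0):
    """Return (inner_fences, outer_fences) computed from the quartiles."""
    # 'inclusive' interpolation corresponds to R's default quantile() (type 7)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - k_inner * iqr, q3 + k_inner * iqr)
    outer = (q1 - k_outer * iqr, q3 + k_outer * iqr)
    return inner, outer


def classify_outliers(values):
    """Split points into outliers beyond the inner fences and extreme outliers beyond the outer fences."""
    (in_lo, in_hi), (out_lo, out_hi) = tukey_fences(values)
    # Outside the inner fences but still inside the outer fences
    mild = [v for v in values if (v < in_lo or v > in_hi) and out_lo <= v <= out_hi]
    # Outside the outer fences
    extreme = [v for v in values if v < out_lo or v > out_hi]
    return mild, extreme


data = [1, 3, 5, 7, 14]
print(tukey_fences(data))       # ((-3.0, 13.0), (-9.0, 19.0))
print(classify_outliers(data))  # ([14], [])
```

For the example dataset, Q1 = 3 and Q3 = 7, so IQR = 4 and the upper inner fence is 13. The value 14 lies beyond it but inside the upper outer fence of 19, so it is flagged as an outlier but not an extreme one.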
Once the outliers are detected, we replace them with NA and treat them as missing values in a missing value analysis. Then we proceed with imputation.
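The detect, mask, and impute workflow can be sketched in Python, with `None` standing in for R's `NA`. The post does not say which imputation method it uses, so filling with the mean of the remaining observed values is an assumption made here for illustration; the helper names `flag_outliers_as_missing` and `impute_with_mean` are hypothetical.

```python
from statistics import quantiles


def flag_outliers_as_missing(values, k=1.5):
    """Replace points beyond the inner fences with None (a stand-in for R's NA)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v if lo <= v <= hi else None for v in values]


def impute_with_mean(values):
    """Fill missing entries with the mean of the observed values (assumed method)."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


data = [1, 3, 5, 7, 14]
masked = flag_outliers_as_missing(data)
print(masked)                    # [1, 3, 5, 7, None]
print(impute_with_mean(masked))  # [1, 3, 5, 7, 4.0]
```

Because the fill value is computed only after the outlier has been masked, the outlier no longer distorts the mean used for imputation.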
R also has a package that calculates the mean of the dataset, checks each value, and detects the value that lies farthest from the mean, treating it as an outlier.
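The mean-based idea can be sketched in Python. This is not the R package's API; `farthest_from_mean` is a hypothetical helper that only illustrates the principle of picking the value with the largest absolute deviation from the sample mean.

```python
def farthest_from_mean(values):
    """Return the value with the largest absolute deviation from the sample mean."""
    m = sum(values) / len(values)
    return max(values, key=lambda v: abs(v - m))


print(farthest_from_mean([1, 3, 5, 7, 14]))  # 14
```

The function always returns some value, even for data without a genuine outlier, so in practice the candidate it returns is usually checked against a threshold or a statistical test before being treated as an outlier.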
I hope this post explains what outliers are. In the next post, we will move on to boxplot outlier analysis in R.